Abstract: The end-to-end nature of Internet congestion control is an
important factor in its scalability and robustness. However, end-to-end
congestion control algorithms alone are incapable of preventing the
congestion collapse and unfair bandwidth allocations created by
applications which are unresponsive to network congestion. In this
paper, we propose and investigate a new congestion avoidance mechanism
called Network Border Patrol (NBP). NBP relies on the exchange of
feedback between routers at the borders of a network in order to detect
and restrict unresponsive traffic flows before they enter the network.
The NBP mechanism is compliant with the Internet philosophy of pushing
complexity toward the edges of the network whenever possible.
Simulation results show that NBP effectively eliminates congestion
collapse, and that, when combined with fair queueing, NBP achieves
approximately max-min fair bandwidth allocations for competing network
flows.

I.  Introduction

The essential philosophy behind the Internet is expressed by the
scalability argument: no protocol, algorithm or service should be
introduced into the Internet if it does not scale well. A key corollary
to the scalability argument is the end-to-end argument: to maintain
scalability, algorithmic complexity should be pushed to the edges of the
network whenever possible. Perhaps the best example of the Internet
philosophy is TCP congestion control, which is achieved primarily
through algorithms implemented at end systems. Unfortunately, TCP
congestion control also illustrates some of the shortcomings of the
end-to-end argument.

As a result of its strict adherence to end-to-end congestion control,
the current Internet suffers from two maladies: congestion collapse from
undelivered packets, and unfair allocations of bandwidth between
competing traffic flows. The first malady, congestion collapse from
undelivered packets, arises when bandwidth is continuously consumed by
packets that are dropped before reaching their ultimate destinations
[1]. Unresponsive flows (see footnote 1), which are becoming
increasingly prevalent in the Internet as network applications using
audio and video become more popular, are the primary cause of this type
of congestion collapse, and the Internet currently has no way of
effectively regulating them.

1 An unresponsive flow is any flow generated by an application that
fails to reduce its transmission rate in response to increased packet
discarding caused by congestion.

The second malady, unfair bandwidth allocation, arises in the Internet
for a variety of reasons, one of which is the presence of unresponsive
flows. Adaptive flows (e.g., TCP flows) that respond to congestion by
rapidly reducing their transmission rates are likely to receive unfairly
small bandwidth allocations when competing with unresponsive or
malicious flows. The Internet protocols themselves also introduce
unfairness. The TCP algorithm, for instance, inherently causes each TCP
flow to receive a bandwidth that is inversely proportional to its round
trip time [2]. Hence, TCP connections with short round trip times may
receive unfairly large allocations of network bandwidth when compared to
connections with longer round trip times.

These maladies, congestion collapse from undelivered packets and unfair
bandwidth allocations, have not gone unrecognized. Some have argued that
they may be mitigated through the use of improved packet scheduling [3]
or queue management [4] mechanisms in network routers. For instance,
per-flow packet scheduling mechanisms like Weighted Fair Queueing (WFQ)
[5], [6] attempt to offer fair allocations of bandwidth to flows
contending for the same link. So does Core-Stateless Fair Queueing
(CSFQ) [7], an approximation of WFQ that requires only edge routers to
maintain per-flow state. Active queue management mechanisms like Fair
Random Early Detection (FRED) [8] achieve an effect similar to fair
queueing by discarding packets from flows that are using more than their
fair share of a link's bandwidth. All of these mechanisms are more
complex and expensive to implement than simple FIFO queueing, but they
reduce the causes of unfairness and congestion collapse in the Internet.
Nevertheless, they do not eradicate them. To illustrate this fact,
consider the example shown in Figure 1. In this example, two
unresponsive flows compete for bandwidth in a network containing two
bottleneck links arbitrated by a fair queueing mechanism. At the first
bottleneck link (R1-R2), fair queueing ensures that each flow receives
half of the link's available bandwidth (750 kbps). On the second
bottleneck link (R2-S4), much of the traffic from flow B is discarded
due to the link's limited capacity (128 kbps). Hence, flow A achieves a
throughput of 750 kbps and flow B achieves a throughput of 128 kbps.
Clearly, congestion collapse has occurred, because flow B packets, which
are ultimately discarded on the second bottleneck link, unnecessarily
limit the throughput of flow A across the first bottleneck link.
Furthermore, while both flows receive equal bandwidth allocations on the
first bottleneck link, their allocations are not globally max-min fair
(see footnote 2). A globally max-min fair allocation of bandwidth would
have been 1.372 Mbps for flow A and 128 kbps for flow B.

2 An allocation of bandwidth is said to be globally max-min fair if, at
every link,

Fig. 1. Example of a network which experiences congestion collapse

This example, which is a variant of an example presented in [1],
illustrates the inability of local scheduling mechanisms, such as WFQ,
to eliminate congestion collapse and achieve global max-min fairness
without the assistance of additional network mechanisms.
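
The allocation quoted for Figure 1 can be checked with a few lines of code. The following is a minimal sketch of the standard progressive-filling procedure for max-min fairness; it is not taken from the paper. The link names `link1` and `link2` are hypothetical labels for the first (1.5 Mbps) and second (128 kbps) bottleneck links, and only those two links are modeled.

```python
def max_min_fair(capacity, routes):
    """Progressive filling: raise every unfrozen flow by the same amount
    until some link saturates, then freeze the flows crossing that link."""
    alloc = {flow: 0.0 for flow in routes}
    remaining = dict(capacity)
    active = set(routes)
    while active:
        # Largest equal increment that every active flow can still receive.
        inc = min(
            remaining[link] / sum(1 for f in active if link in routes[f])
            for link in remaining
            if any(link in routes[f] for f in active)
        )
        for f in active:
            alloc[f] += inc
        for link in remaining:
            users = sum(1 for f in active if link in routes[f])
            remaining[link] -= inc * users
        saturated = {link for link, left in remaining.items() if left <= 1e-9}
        active = {f for f in active if not (set(routes[f]) & saturated)}
    return alloc


# Capacities in kbps; flow A crosses only the first bottleneck,
# flow B crosses both bottlenecks.
capacity = {"link1": 1500.0, "link2": 128.0}
routes = {"A": ["link1"], "B": ["link1", "link2"]}
allocation = max_min_fair(capacity, routes)
print(allocation)
```

Flow B is frozen at 128 kbps once the second link saturates, and flow A then absorbs the remaining 1372 kbps of the first link, which matches the figures given above.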

Jain et al. have proposed several rate control algorithms that are able
to prevent congestion collapse and provide global max-min fairness to
competing flows [10]. These algorithms (e.g., ERICA, ERICA+) are
designed for the ATM Available Bit Rate (ABR) service and require all
network switches to compute fair allocations of bandwidth among
competing connections. However, these algorithms are not easily
tailorable to the current Internet, because they violate the Internet
design philosophy of keeping router implementations simple and pushing
complexity to the edges of the network.

Floyd and Fall have approached the problem of congestion collapse by
proposing low-complexity router mechanisms that promote the use of
adaptive or "TCP-friendly" end-to-end congestion control [1]. Their
suggested approach requires selected gateway routers to monitor
high-bandwidth flows in order to determine whether they are responsive
to congestion. Flows that are determined to be unresponsive are
penalized by a higher packet discarding rate at the gateway router. A
limitation of this approach is that the procedures currently available
to identify unresponsive flows are not always successful [7].

In this paper, we introduce and investigate a new Internet traffic
control mechanism called Network Border Patrol (NBP). The basic
principle of NBP is to compare, at the borders of the network, the rates
at which each flow's packets are entering and leaving the network. If
packets are entering the network faster than they are leaving it, then
the network is very likely to be buffering or, worse yet, discarding the
flow's packets. In other words, the network is receiving more packets
than it can handle. NBP prevents this scenario by "patrolling" the
network's borders, ensuring that packets do not enter the network at a
rate greater than they are able to leave it. This has the beneficial
effect of preventing congestion collapse from undelivered packets,
because an unresponsive flow's otherwise undeliverable packets never
enter the network in the first place.

NBP's prevention of congestion collapse comes at the expense of some
additional network complexity, since routers at the borders of the
network (i.e., edge routers) are expected to monitor and control the
rates of individual flows. NBP also introduces an added communication
overhead, since in order for an edge router to know the rate at which
its packets are leaving the network, it must exchange feedback with
other edge routers. However, unlike other existing approaches to the
problem of congestion collapse, NBP's added complexity is isolated to
edge routers; routers within the core of the network remain unchanged.
Moreover, end systems operate in total ignorance of the fact that NBP is
implemented in the network, so no changes to transport protocols are
necessary.

2 (continued) all active flows not bottlenecked at another link are
allocated a maximum, equal share of the link's remaining bandwidth [9].

Fig. 2. The core-stateless Internet architecture assumed by NBP

Note that the primary goal of NBP is to prevent congestion collapse from
undelivered packets. On its own, NBP cannot provide global max-min
fairness to competing network flows. Nevertheless, when combined with
fair queueing at core routers, NBP can achieve approximate global
max-min fairness, as we will show later in this paper.

The remainder of this paper is organized as follows. In section II, we
describe the architectural components of the Network Border Patrol
mechanism in further detail and present the feedback and rate control
algorithms used by NBP edge routers to prevent congestion collapse. In
section III, we present the results of several simulations, which
illustrate the ability of NBP to avoid congestion collapse and, when
combined with a fair queueing algorithm in core routers, to provide
global max-min fairness to competing network flows. In section IV, we
discuss several implementation and scalability issues that must be
addressed in order to make deployment of NBP feasible in the Internet.
Finally, in section V we provide some concluding remarks.

II.  Network  Border  Patrol 

Network Border Patrol is a core-stateless congestion avoidance
mechanism. That is, it is aligned with the core-stateless approach [7],
which allows routers on the borders (or edges) of a network to perform
flow classification and maintain per-flow state but does not allow
routers at the core of the network to do so. Figure 2 illustrates this
architecture. In this paper, we draw a further distinction between two
types of edge routers. Depending on which flow it is operating on, an
edge router may be viewed as an ingress or an egress router. An edge
router operating on a flow passing into a network is called an ingress
router, whereas an edge router operating on a flow passing out of a
network is called an egress router. Note that a flow may pass through
more than one egress (or ingress) router if the end-to-end path crosses
multiple networks.

Fig. 3. An input port of an NBP egress router

NBP prevents congestion collapse through a combination of per-flow rate
monitoring at egress routers and per-flow rate control at ingress
routers. Rate monitoring allows an egress router to determine how
rapidly each flow's packets are leaving the network, whereas rate
control allows an ingress router to police the rate at which each flow's
packets enter the network. Linking these two functions together are the
feedback packets exchanged between ingress and egress routers; ingress
routers send egress routers forward feedback packets to inform them
about the flows that are being rate controlled, and egress routers send
ingress routers backward feedback packets to inform them about the rates
at which each flow's packets are leaving the network.

This section describes three important aspects of the NBP mechanism:
(1) the architectural components, namely the modified edge routers,
which must be present in the network, (2) the feedback control
algorithm, which determines how and when information is exchanged
between edge routers, and (3) the rate control algorithm, which uses the
information carried in feedback packets to regulate flow transmission
rates and thereby prevent congestion collapse in the network.

A.  Architectural  Components 

The only components of the network that require modification by NBP are
edge routers. The input ports of egress routers must be modified to
perform per-flow monitoring of bit rates, and the output ports of
ingress routers must be modified to perform per-flow rate control. In
addition, both the ingress and the egress routers must be modified to
exchange and handle feedback.

Figure 3 illustrates the architecture of an NBP egress router's input
port. Packets sent by ingress routers arrive at the input port of the
egress router and are first classified by flow. In the case of IPv6,
this is done by examining the packet header's flow label, whereas in the
case of IPv4, it is done by examining the packet's source and
destination addresses and port numbers. Each flow's bit rate is then
monitored using a rate estimation algorithm such as the Time Sliding
Window (TSW) [11]. These rates are collected by a feedback controller,
which returns them in backward feedback packets to an ingress router
whenever a forward feedback packet arrives from that ingress router. In
some cases, to be described later in this section, backward feedback
packets are also generated asynchronously; that is, an egress router
sends them to an ingress router without first waiting for a forward
feedback packet.

Fig. 4. An output port of an NBP ingress router
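
The classification and rate monitoring steps of an egress input port might be sketched as follows. This is an illustration only: the paper names TSW [11] without spelling it out, so the update rule below is the commonly cited form of that estimator and should be read as an assumption. The packet dictionary layout and the class names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TswEstimator:
    """Time Sliding Window rate estimator (assumed, commonly cited form)."""
    window: float           # averaging window in seconds
    avg_rate: float = 0.0   # estimated rate in bytes per second
    t_front: float = 0.0    # arrival time of the previous packet

    def update(self, now: float, size: int) -> float:
        bytes_in_window = self.avg_rate * self.window
        new_bytes = bytes_in_window + size
        self.avg_rate = new_bytes / (now - self.t_front + self.window)
        self.t_front = now
        return self.avg_rate


def flow_spec(pkt: dict):
    """IPv6: flow label. IPv4: addresses plus port numbers."""
    if pkt["version"] == 6:
        return ("v6", pkt["flow_label"])
    return ("v4", pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"])


class EgressMonitor:
    """Per-flow rate monitoring at an egress router input port."""

    def __init__(self, window: float = 0.010):  # 10 msec, as in Table 1
        self.window = window
        self.estimators = {}

    def on_packet(self, now: float, pkt: dict) -> float:
        key = flow_spec(pkt)
        if key not in self.estimators:
            self.estimators[key] = TswEstimator(self.window)
        return self.estimators[key].update(now, pkt["size"])


# Usage: 1000-byte packets every millisecond approach 1,000,000 bytes/s.
monitor = EgressMonitor()
pkt = {"version": 4, "src": "10.0.0.1", "dst": "10.0.1.1",
       "sport": 5000, "dport": 80, "size": 1000}
rate = 0.0
for i in range(1, 301):
    rate = monitor.on_packet(i * 0.001, pkt)
```

With a steady arrival pattern the estimate converges geometrically toward the true rate, at a speed governed by the ratio of the window length to the packet spacing.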

The output ports of NBP ingress routers are also enhanced. Each contains
a flow classifier, per-flow traffic shapers (e.g., leaky buckets), a
feedback controller, and a rate controller. See Figure 4. The flow
classifier classifies packets into flows, and the traffic shapers limit
the rates at which packets from individual flows enter the network. The
feedback controller receives backward feedback packets returning from
egress routers and passes their contents to the rate controller. It also
generates forward feedback packets, which it periodically transmits to
the network's egress routers. The rate controller adjusts traffic shaper
parameters according to a TCP-like rate control algorithm, which is
described later in this section.
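
A per-flow traffic shaper whose rate can be adjusted by the rate controller might look like the sketch below. The paper only says "e.g., leaky buckets", so the token-bucket formulation, the `FlowShaper` name and the `depth` parameter are assumptions made for illustration. The rate is assumed to be positive.

```python
class FlowShaper:
    """Token-bucket style shaper for one flow (illustrative sketch)."""

    def __init__(self, rate: float, depth: float):
        self.rate = rate      # bytes per second; set by the rate controller
        self.depth = depth    # largest permitted burst in bytes
        self.tokens = depth   # bytes that may be sent immediately
        self.last = 0.0       # time at which `tokens` was last brought up to date

    def set_rate(self, rate: float) -> None:
        self.rate = rate

    def release_time(self, now: float, size: int) -> float:
        """Earliest time a packet of `size` bytes may enter the network."""
        # Credit tokens earned since the last update. The value can be
        # negative while earlier packets are still waiting for release.
        self.tokens = min(self.depth,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size
            return now
        wait = (size - self.tokens) / self.rate
        self.tokens = 0.0
        self.last = now + wait
        return now + wait


# Usage: three 1000-byte packets arrive together at a 1000 bytes/s shaper.
shaper = FlowShaper(rate=1000.0, depth=1000.0)
releases = [shaper.release_time(0.0, 1000) for _ in range(3)]
```

The first packet uses the initial burst allowance and the others are spaced one second apart, so the flow cannot enter the network faster than the configured ingress rate.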

B.  The  Feedback  Control  Algorithm 

The NBP feedback control algorithm determines how and when feedback
packets are exchanged between edge routers. Feedback packets take the
form of ICMP packets and are necessary in NBP for three reasons. First,
they allow egress routers to discover which ingress routers are acting
as sources for each of the flows they are monitoring. Second, they allow
egress routers to communicate per-flow bit rates to ingress routers.
Third, they allow ingress routers to detect network congestion and
control their feedback generation intervals by estimating edge-to-edge
round trip times.

The contents of NBP feedback packets are shown in Figure 5. Contained
within the forward feedback packet is a time stamp and a list of flow
specifications (see footnote 3) for flows originating at the ingress
router. The time stamp is used to calculate the round trip time between
two edge routers, and the list of flow specifications indicates to an
egress router the identities of active flows originating at the ingress
router. (An edge router adds a flow to its list of active flows whenever
a packet from a new flow arrives; it removes a flow when the flow
becomes inactive.) In the event that the network's maximum transmission
unit size is not sufficient to hold an entire list of flow
specifications, multiple forward feedback packets are used.

3 A flow specification is a value uniquely identifying a flow. In IPv6
it is the flow's flow label. In IPv4, it is the combination of source
address, destination address, source port number, and destination port
number.

Fig. 5. Forward and backward feedback packets exchanged by edge routers
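
The splitting of a long flow list across several forward feedback packets can be sketched as follows. The paper does not give a wire format, so the byte sizes, the `ForwardFeedback` class and the `build_forward_feedback` helper are all hypothetical and serve only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import Hashable, List

# Hypothetical sizes; the paper does not specify an encoding.
ASSUMED_HEADER_BYTES = 36  # IP and ICMP headers plus the time stamp
ASSUMED_SPEC_BYTES = 12    # one IPv4 flow specification


@dataclass
class ForwardFeedback:
    timestamp: float             # used by the ingress router to measure RTT
    flow_specs: List[Hashable]   # active flows originating at the ingress


def build_forward_feedback(now, flow_specs, mtu):
    """Split the active-flow list so that each packet fits within the MTU."""
    per_packet = (mtu - ASSUMED_HEADER_BYTES) // ASSUMED_SPEC_BYTES
    if per_packet < 1:
        raise ValueError("MTU too small for even one flow specification")
    chunks = [flow_specs[i:i + per_packet]
              for i in range(0, len(flow_specs), per_packet)]
    # Send at least one packet so that the time stamp still travels.
    return [ForwardFeedback(now, chunk) for chunk in (chunks or [[]])]


specs = [("10.0.0.1", "10.0.1.1", 5000 + i, 80) for i in range(250)]
packets = build_forward_feedback(now=12.5, flow_specs=specs, mtu=1500)
```

Under these assumed sizes a 1500-byte MTU holds 122 flow specifications, so 250 active flows need three packets, each carrying the same time stamp.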

When an egress router receives a forward feedback packet, it immediately
generates a backward feedback packet and returns it to the ingress
router. Contained within the backward feedback packet are the forward
feedback packet's original time stamp, a router hop count, and a list of
observed bit rates, called egress rates, collected by the egress router
for each flow listed in the forward feedback packet. The router hop
count, which is used by the ingress router's rate control algorithm,
indicates how many routers are in the path between the ingress and the
egress router. The egress router determines the hop count by examining
the time to live (TTL) field of arriving forward feedback packets. When
the backward feedback packet arrives at the ingress router, its contents
are passed to the ingress router's rate controller, which uses them to
adjust the parameters of each flow's traffic shaper.
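
One way an egress router could assemble a backward feedback packet is sketched below. The paper states that the hop count is derived from the TTL field but not how, so this sketch assumes that every ingress router sends forward feedback with a known initial TTL (`ASSUMED_INITIAL_TTL`, a hypothetical constant). The class and function names are likewise illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Hashable

ASSUMED_INITIAL_TTL = 255  # assumption: agreed TTL for forward feedback


@dataclass
class BackwardFeedback:
    timestamp: float                    # copied from the forward feedback
    hop_count: int                      # routers between ingress and egress
    egress_rates: Dict[Hashable, float]
    asynchronous: bool = False


def on_forward_feedback(ff_timestamp, ff_flow_specs, received_ttl,
                        monitored_rates):
    """Build the reply an egress router returns to the ingress router."""
    # Each router on the path decrements the TTL by one.
    hops = ASSUMED_INITIAL_TTL - received_ttl
    # Flows not yet seen at the egress are reported with a rate of zero.
    rates = {spec: monitored_rates.get(spec, 0.0) for spec in ff_flow_specs}
    return BackwardFeedback(ff_timestamp, hops, rates)


bf = on_forward_feedback(1.5, ["a", "b"], received_ttl=252,
                         monitored_rates={"a": 1000.0})
```

Echoing the original time stamp lets the ingress router compute the round trip time from its own clock alone, so the two edge routers need no clock synchronization.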

In order to determine how often to generate forward feedback packets, an
ingress router keeps, for each egress router, a timer which determines
the frequency of forward feedback packet generation. To maintain an
adequate and consistent feedback update interval, the timer repeatedly
expires after an interval of time known as the base round trip time. The
base round trip time for egress router e, denoted e.baseRTT, is defined
as the shortest observed round trip time between the ingress router and
egress router e, and it generally reflects the round trip time between
the two edge routers when the network is not congested. The value
e.baseRTT is calculated by estimating the current round trip time from
each arriving backward feedback packet and updating e.baseRTT whenever
the current round trip time is less.
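
The base round trip time bookkeeping can be illustrated with a small sketch. The `EgressPeer` class and the `initial_interval` parameter (the feedback interval used before any round trip time has been measured) are assumptions; the paper does not say how the timer is initialized.

```python
class EgressPeer:
    """State an ingress router keeps for one egress router (sketch)."""

    def __init__(self, initial_interval: float = 0.1):
        self.initial_interval = initial_interval  # assumed start-up value
        self.base_rtt = float("inf")
        self.current_rtt = None

    def on_normal_bf(self, now: float, timestamp: float) -> float:
        """Update RTT estimates from a normal backward feedback packet
        and return the excess of the current RTT over the base RTT."""
        self.current_rtt = now - timestamp
        self.base_rtt = min(self.base_rtt, self.current_rtt)
        return self.current_rtt - self.base_rtt

    def next_ff_time(self, now: float) -> float:
        """The forward feedback timer expires one base RTT from now."""
        if self.base_rtt == float("inf"):
            return now + self.initial_interval
        return now + self.base_rtt


peer = EgressPeer()
first_timer = peer.next_ff_time(0.0)
for sent, received in [(1.00, 1.05), (2.00, 2.03), (3.00, 3.04)]:
    excess = peer.on_normal_bf(received, sent)
```

After round trip samples of 50, 30 and 40 msec the base value settles at 30 msec, and the last sample exceeds it by 10 msec.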

Egress routers may also generate backward feedback packets
asynchronously. If an egress router does not receive a forward feedback
packet from an ingress router within a fixed interval of time (denoted
AsynchInterval), it generates and transmits a backward feedback packet
to the ingress router. Asynchronously generated backward feedback
packets are specially marked by the egress router and are not used by
the ingress router to update the round trip time measurement. The reason
for asynchronous backward feedback packet generation is to prevent the
squelching of congestion feedback when forward feedback packets are
delayed or dropped by the network. It also ensures that ingress routers
receive frequent rate feedback and are able to respond to congestion
even when the distance between edge routers is very large.

on arrival of BF packet p from egress router e
    if (p.asynchronous == FALSE)
        e.currentRTT = cur_time - p.timestamp;
        if (e.currentRTT < e.baseRTT)
            e.baseRTT = e.currentRTT;
        deltaRTT = e.currentRTT - e.baseRTT;
        for each flow f listed in p
            f.mrc = min (MSS / e.currentRTT, f.egress_rate / MF);
            if (f.phase == SLOW_START)
                if (deltaRTT * f.ingress_rate < MSS * e.hopcount)
                    f.ingress_rate = f.ingress_rate * 2;
                else
                    f.phase = CONG_AVOID;
            if (f.phase == CONG_AVOID)
                if (deltaRTT * f.ingress_rate < MSS * e.hopcount)
                    f.ingress_rate = f.ingress_rate + f.mrc;
                else
                    f.ingress_rate = f.egress_rate - f.mrc;
    else /* p.asynchronous == TRUE */
        for each flow f listed in p
            if (f.phase == SLOW_START)
                if (f.ingress_rate > f.egress_rate * 8)
                    f.ingress_rate = f.egress_rate - f.mrc;
                    f.phase = CONG_AVOID;
            else /* f.phase == CONG_AVOID */
                if (f.ingress_rate > f.egress_rate + 3 * f.mrc)
                    f.ingress_rate = f.egress_rate - f.mrc;

Fig. 6. Pseudocode for ingress router rate control algorithm

C.  The  Rate  Control  Algorithm 

The NBP rate control algorithm regulates the rate at which each flow
enters the network. Its primary goal is to converge on a set of per-flow
transmission rates (hereinafter called ingress rates) that prevents
congestion collapse from undelivered packets. It also attempts to lead
the network to a state of maximum link utilization and low router buffer
occupancies, and it does this in a manner that is similar to TCP.

In the NBP rate control algorithm, shown in Figure 6, a flow may be in
one of two phases, slow start or congestion avoidance, which are similar
to the phases of TCP congestion control. New flows enter the network in
the slow start phase and proceed to the congestion avoidance phase only
after the flow has experienced congestion. The rate control algorithm is
invoked whenever a backward feedback (BF) packet arrives at an ingress
router. Recall that egress routers send two types of BF packets to
ingress routers: normal BF packets, which are generated when an egress
router receives a forward feedback (FF) packet, and asynchronous BF
packets, which egress routers generate without any prompting from an
ingress router. Both types of BF packets contain a list of flows
arriving at the egress router from the ingress router as well as the
monitored egress rate for each flow. However, only normal BF packets
contain meaningful time stamps which are copied from arriving FF
packets.

If the arriving BF packet is a normal BF packet, then the algorithm
calculates the current round trip time and updates the base round trip
time, if necessary. It then calculates deltaRTT, which is the difference
between the current round trip time (e.currentRTT) and the base round
trip time (e.baseRTT). A deltaRTT value greater than zero indicates that
packets are requiring a longer time to traverse the network than they
once did, and this can only be due to the buffering of packets within
the network.

NBP's rate control algorithm decides that a flow is experiencing
congestion whenever it estimates that the network has buffered the
equivalent of more than one of the flow's packets at each router hop. To
do this, the algorithm first computes the product of the flow's ingress
rate and deltaRTT. This value provides an estimate of the amount of flow
data that is buffered somewhere in the network. If it is greater than
the number of router hops between the ingress and the egress router
multiplied by the size of the largest possible packet, then the flow is
considered to be experiencing congestion. The rationale for determining
congestion in this way is to maintain both high link utilization and low
queueing delay. Ensuring there is always at least one packet buffered
for transmission on a network link is the simplest way to achieve full
utilization of the link, and deciding that congestion exists when more
than one packet is buffered at the link keeps queueing delays low.
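
A short numeric illustration of this congestion test follows. The function name and the example figures (a 1 Mbps flow crossing two router hops) are chosen for illustration only; rates are in bytes per second and times in seconds, with MSS taken from Table 1.

```python
def is_congested(ingress_rate, delta_rtt, mss, hop_count):
    """Estimate buffered flow data and compare it with one MSS per hop.

    The comparison follows Figure 6: the flow is treated as uncongested
    only while the buffered estimate is strictly below the threshold.
    """
    buffered_bytes = ingress_rate * delta_rtt
    return buffered_bytes >= mss * hop_count


MSS = 1500        # bytes (Table 1)
rate = 125_000.0  # bytes per second, i.e. 1 Mbps

# Threshold: 2 hops * 1500 bytes = 3000 bytes.
light = is_congested(rate, 0.020, MSS, 2)  # about 2500 bytes buffered
heavy = is_congested(rate, 0.030, MSS, 2)  # about 3750 bytes buffered
```

A 20 msec rise in round trip time leaves the estimate below the 3000-byte threshold, whereas a 30 msec rise pushes it above, at which point the flow is considered congested.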

When the rate control algorithm determines that a flow is not
experiencing congestion, it increases the flow's ingress rate. If the
flow is in the slow start phase, its ingress rate is doubled. Doubling
the ingress rate allows a new flow to rapidly capture available
bandwidth if the network is underutilized. If the flow is in the
congestion avoidance phase, its ingress rate is conservatively
incremented by a minimum rate change (MRC) value in order to avoid the
creation of congestion. MRC is computed as the maximum segment size
divided by the current round trip time between the edge routers. This
results in rate growth behavior that is similar to TCP in its congestion
avoidance phase. Furthermore, MRC is not allowed to exceed the flow's
current egress rate divided by a constant factor (MF). This guarantees
that rate increments are not excessively large when the round trip time
is small.

When the rate control algorithm determines that a flow is experiencing
congestion, it reduces the flow's ingress rate. If a flow is in the slow
start phase, it enters the congestion avoidance phase. If a flow is
already in the congestion avoidance phase, its ingress rate is reduced
to the flow's egress rate decremented by MRC. In other words, an
observation of congestion forces the ingress router to send the flow's
packets into the network at a rate slightly lower than the rate at which
they are leaving the network.

The actions described above are taken only when a normal BF packet
arrives at an ingress router. A different set of actions is taken when
an asynchronous BF packet arrives. This is because, unlike normal BF
packets, asynchronous BF packets are not generated in response to FF
packets and thus do not carry meaningful time stamps. Therefore, the
congestion status of the network cannot be determined through the use of
round trip time measurements. Instead, it is determined by comparing a
flow's ingress and egress rates. In the slow start phase, a flow is
considered to be experiencing congestion when its current ingress rate
exceeds its reported egress rate by a factor of eight. The reason for
the choice of the value eight is that we found a delay of three round
trip times is typically required for a change in the ingress rate to be
fully reflected in the egress rate of a backward feedback packet. During
this time, the flow may double its ingress rate three times, increasing
it by at most a factor of eight. Similarly, in the congestion avoidance
phase, a flow is considered to be experiencing congestion whenever its
current ingress rate exceeds its reported egress rate by three MRC
increments. The reasoning in this case is similar to the reasoning used
in the slow start case, except that a flow in the congestion avoidance
phase may only increase its ingress rate by at most three MRC increments
during three round trip times.

Simulation parameter                     Value
Packet size                              1000 bytes
Router queue size                        100 packets
Maximum segment size (MSS)               1500 bytes
TCP implementation                       Reno [12]
TCP window size                          100 kbytes
MRC factor (MF)                          10
AsynchInterval                           10 msec
TSW window size                          10 msec
End-system-to-edge propagation delay     100 µsec
End-system-to-edge link bandwidth        10 Mbps

Table 1. Default simulation parameters

Clearly, the steps taken to determine congestion when an asynchronous BF
packet arrives are more tolerant of transient congestion than the steps
taken to determine congestion when a normal BF packet arrives. This is
because asynchronous BF packets are only meant to be used as a stopgap
measure to prevent serious congestion from developing during the
interval between normal BF packet arrivals.
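
For readers who prefer executable code, the rate control algorithm of Figure 6 can be transcribed into Python roughly as follows. This is a sketch rather than the authors' ns-2 implementation: the `Flow` and `Egress` containers and the function signature are assumptions, rates are in bytes per second, and the round trip time is assumed to be positive.

```python
from dataclasses import dataclass

SLOW_START, CONG_AVOID = "SLOW_START", "CONG_AVOID"
MSS = 1500.0  # bytes (Table 1)
MF = 10.0     # MRC factor (Table 1)


@dataclass
class Flow:
    ingress_rate: float
    egress_rate: float = 0.0
    mrc: float = 0.0
    phase: str = SLOW_START


@dataclass
class Egress:
    hop_count: int
    base_rtt: float = float("inf")
    current_rtt: float = 0.0


def on_backward_feedback(now, egress, flows, timestamp, egress_rates,
                         asynchronous):
    """Apply Figure 6 to every flow listed in a BF packet."""
    for spec, rate in egress_rates.items():
        flows[spec].egress_rate = rate
    listed = [flows[spec] for spec in egress_rates]
    if not asynchronous:
        egress.current_rtt = now - timestamp
        egress.base_rtt = min(egress.base_rtt, egress.current_rtt)
        delta_rtt = egress.current_rtt - egress.base_rtt
        for f in listed:
            f.mrc = min(MSS / egress.current_rtt, f.egress_rate / MF)
            uncongested = delta_rtt * f.ingress_rate < MSS * egress.hop_count
            if f.phase == SLOW_START:
                if uncongested:
                    f.ingress_rate *= 2
                else:
                    f.phase = CONG_AVOID
            # Deliberately not "elif": a flow that has just left slow
            # start is also cut back below its egress rate, as in Fig. 6.
            if f.phase == CONG_AVOID:
                if uncongested:
                    f.ingress_rate += f.mrc
                else:
                    f.ingress_rate = f.egress_rate - f.mrc
    else:
        for f in listed:
            if f.phase == SLOW_START:
                if f.ingress_rate > f.egress_rate * 8:  # 2**3: three doublings
                    f.ingress_rate = f.egress_rate - f.mrc
                    f.phase = CONG_AVOID
            elif f.ingress_rate > f.egress_rate + 3 * f.mrc:
                f.ingress_rate = f.egress_rate - f.mrc


# Usage: one flow, two router hops, three normal BF packets.
egress = Egress(hop_count=2)
flows = {"f": Flow(ingress_rate=10_000.0)}
on_backward_feedback(1.01, egress, flows, 1.0, {"f": 10_000.0}, False)
rate_after_first = flows["f"].ingress_rate
on_backward_feedback(2.05, egress, flows, 2.0, {"f": 16_000.0}, False)
on_backward_feedback(3.11, egress, flows, 3.0, {"f": 16_000.0}, False)
```

In this run the flow doubles twice while the delay stays low. On the third packet the round trip time has grown by about 100 msec, so the flow leaves slow start and its ingress rate is set to the egress rate minus one MRC (16000 - 1600 = 14400 bytes per second).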

III.  Simulation  Experiments 

In this section, we present the results of several simulation
experiments, each of which is designed to test a different aspect of
Network Border Patrol. The first set of experiments examines the ability
of NBP to prevent congestion collapse, and the second set of experiments
examines its ability to provide fair bandwidth allocations to competing
network flows. All simulations were run for 100 seconds using the UC
Berkeley/LBNL/VINT ns-2 simulator [13]. The ns-2 code implementing NBP
and the scripts to run these simulations are available at the UCI
Network Research Group web site [14]. Default simulation parameters are
shown in Table 1. They are set to values commonly used in the Internet
and are used in all simulation experiments unless otherwise noted.

A.  Preventing  Congestion  Collapse 

The first set of simulation experiments explores NBP's ability to
prevent congestion collapse from undelivered packets. In the first
experiment, we study the scenario depicted in Figure 7. One flow is a
TCP flow generated by an application which always has data to send, and
the other flow is an unresponsive constant bit rate UDP flow. Both flows
compete for access to a shared 1.5 Mbps bottleneck link (R1-R2), and
only the UDP flow traverses a second bottleneck link (R2-E2), which has
a limited capacity of 128 kbps.

Fig. 7. A network with a single shared link

Fig. 8. Congestion collapse observed as unresponsive traffic load
increases. The solid line shows the combined throughput delivered by the
network.

Fig. 9. A network with multiple congested router hops

Figure 8 shows the throughput achieved by the two flows as
the UDP source's transmission rate is increased from 32 kbps to
2 Mbps. The combined throughput delivered by the network (i.e.,
the sum of both flow throughputs) is also shown. Three different
cases are examined under this scenario. The first is the benchmark
case used for comparison: NBP is not used between edge routers,
and all routers schedule the delivery of packets on a FIFO basis.
As Figure 8(a) shows, the network experiences severe congestion
collapse as the UDP flow's transmission rate increases, because
the UDP flow fails to respond adaptively to the discarding of its
packets on the second bottleneck link. When the UDP load
increases to 1.5 Mbps, the TCP flow's throughput drops nearly to
zero. In the second case, weighted fair queueing (WFQ) replaces
FIFO queueing in each of the routers, and the result, shown in
Figure 8(b), is better throughput for the TCP flow. However, as
indicated by the combined throughput of both flows, congestion
collapse still occurs as the UDP load increases. Although WFQ
allocates 750 kbps to each flow at the first bottleneck link, only
128 kbps of the UDP flow's share is successfully exploited, because
the UDP flow is even more seriously bottlenecked by the second link.
The remaining 622 kbps is wasted on undelivered packets. In the
third case, FIFO queues are reintroduced, and NBP is installed in
the edge routers. As Figure 8(c) shows, NBP effectively
eliminates congestion collapse; the TCP flow achieves a nearly optimal
throughput of 1.37 Mbps, and the combined throughput remains
very close to 1.5 Mbps.
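
The bandwidth accounting in the second and third cases can be reproduced with a short back-of-the-envelope sketch. It is an idealized fluid approximation, not the ns-2 model: TCP is assumed to be a greedy flow that uses whatever capacity is left on the shared link, and the function names `wfq_only` and `nbp_ideal` are hypothetical helpers introduced here for illustration.

```python
# Idealized fluid accounting for the two-flow scenario of Figure 7.
# All rates are in kbps. Assumption: TCP greedily fills leftover capacity.

SHARED_LINK = 1500   # first bottleneck, shared by the TCP and UDP flows
SECOND_LINK = 128    # second bottleneck, crossed by the UDP flow only


def wfq_only(udp_rate):
    """WFQ on the shared link, no policing at the ingress router."""
    fair_share = SHARED_LINK / 2
    udp_on_shared = min(udp_rate, fair_share)        # WFQ caps UDP at its share
    tcp = SHARED_LINK - udp_on_shared                # TCP takes the remainder
    udp_delivered = min(udp_on_shared, SECOND_LINK)  # second link drops the rest
    wasted = udp_on_shared - udp_delivered           # carried, then discarded
    return {"tcp": tcp, "udp": udp_delivered, "wasted": wasted,
            "combined": tcp + udp_delivered}


def nbp_ideal(udp_rate):
    """Ingress policing that admits UDP no faster than it can leave."""
    udp_admitted = min(udp_rate, SECOND_LINK)
    tcp = SHARED_LINK - udp_admitted
    return {"tcp": tcp, "udp": udp_admitted, "wasted": 0,
            "combined": tcp + udp_admitted}


for rate in (64, 750, 1500, 2000):
    print(rate, wfq_only(rate), nbp_ideal(rate))
```

With a UDP load of 1.5 Mbps, the WFQ case yields 622 kbps of wasted capacity, and the idealized NBP case leaves 1372 kbps for TCP, which is consistent with the 1.37 Mbps observed in simulation.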

In the second experiment, we examine whether these positive
results continue to hold when a TCP flow traverses
several bottleneck links carrying traffic from unresponsive UDP
flows. The simulation model for this experiment is shown in
Figure 9. In this configuration, a TCP flow shares several 1.5 Mbps
bottleneck links with unresponsive UDP flows, each of which
is further bottlenecked by another link with a capacity of 128
kbps. All links have propagation delays of 10 msec, and the UDP
sources each transmit packets at a constant rate of 1 Mbps.

Figure 10 shows the throughput of the TCP flow as the number
of congested router hops increases from 1 to 10. When only FIFO
scheduling is used, the TCP flow achieves a throughput of
approximately 0.5 Mbps regardless of the number of hops, whereas NBP
allows the network to avoid congestion collapse, allocating nearly
1.31 Mbps to the TCP flow when the number of hops is small. As
the number of hops increases, the throughput of the TCP flow
diminishes somewhat due to increased feedback delays between the
TCP flow's edge routers.

B.  Achieving  Fairness 

The primary goal of NBP is to prevent congestion collapse from
occurring. However, its secondary goal is to improve the fairness
of bandwidth allocations to competing network flows. In this
second set of simulation experiments, we examine whether NBP can
achieve fair bandwidth allocations on its own, and, if not, whether
it can do so in conjunction with other common network protocols
and mechanisms.

Fig. 10. TCP throughput in a network with multiple congested router hops

(b) Moderate unfairness using NBP with FIFO
Fig. 11. Unfairness as the unresponsive traffic load increases

In the first fairness experiment, we consider only one cause of
unfairness: the existence of unresponsive flows. We return to the
scenario depicted in Figure 7 but replace the second bottleneck
link (R2-E2) with a higher capacity 10 Mbps link. The TCP flow is
generated by an application which always has data to send, and the
UDP flow is generated by an unresponsive source which transmits
packets at a constant bit rate.

(a) Severe congestion collapse using FIFO only
(b) Good fairness with congestion collapse using WFQ only
(c) Slight unfairness but no congestion collapse using NBP with FIFO
Fig. 12. Unfairness as the TCP round trip time increases
(horizontal axis: TCP1 round trip time, in seconds)

Since there is only one 1.5 Mbps bottleneck link (R1-R2) in this
scenario, the max-min fair allocation of bandwidth is 750 kbps
for each flow (provided the UDP source exceeds a transmission rate
of 750 kbps). However, as Figure 11(a) shows, fairness is clearly
not achieved when only FIFO scheduling is used in routers. As
the unresponsive UDP traffic load increases, the TCP flow
experiences congestion and reduces its transmission rate, thereby
granting an unfairly large amount of bandwidth to the
unresponsive UDP flow. Thus, although there is no congestion collapse
from undelivered packets, there is clearly unfairness. Figure 11(b)
shows the throughput of each flow when NBP is introduced.
Notice that NBP is able to reduce the amount of unfairness observed
with FIFO scheduling only, but it does not completely eliminate
unfairness. This is due to the fact that NBP has no mechanism that
explicitly enforces fairness.
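
The 750 kbps figure used as the fairness benchmark in this experiment follows from the usual progressive-filling construction of a max-min fair allocation. The sketch below applies it to a single link; it is an illustration only, and the function name `max_min_fair` and the use of `math.inf` to represent a source that always has data to send are assumptions made here.

```python
import math


def max_min_fair(capacity, demands):
    """Progressive filling on one link.

    capacity: link capacity in kbps.
    demands:  dict mapping flow name -> offered rate in kbps
              (math.inf for a source that always has data to send).
    Returns a dict mapping flow name -> max-min fair allocation.
    """
    allocation = {}
    remaining = capacity
    pending = dict(demands)
    while pending:
        share = remaining / len(pending)
        # Flows that ask for no more than an equal share are fully satisfied.
        satisfied = {f: d for f, d in pending.items() if d <= share}
        if not satisfied:
            # Every remaining flow is limited by this link: split equally.
            for f in pending:
                allocation[f] = share
            break
        for f, d in satisfied.items():
            allocation[f] = d
            remaining -= d
            del pending[f]
    return allocation


# UDP above 750 kbps: both flows are held to 750 kbps.
print(max_min_fair(1500, {"tcp": math.inf, "udp": 1000}))
# UDP below 750 kbps: UDP gets its demand, TCP gets the rest.
print(max_min_fair(1500, {"tcp": math.inf, "udp": 500}))
```

The second call shows why the 750 kbps benchmark applies only once the UDP source exceeds that rate: below it, the UDP flow is satisfied in full and the TCP flow is entitled to the remainder.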

In the second fairness experiment we consider another cause of
unfairness: TCP's dependence on the round trip time. In order to
study this type of unfairness, we reuse the scenario from the first
fairness experiment, but we return the second bottleneck link
capacity to 128 kbps and introduce a new TCP flow (TCP2) between
S2 and S3. Thus, two TCP flows and one unresponsive UDP flow
share the first bottleneck link (R1-R2), and only the UDP flow
crosses the second bottleneck link (R2-E2). In order to study the
impact of increasing round trip times on fairness, the round trip
time of the original TCP flow (TCP1) is varied by changing the
propagation delay of link I1-R1. All other link propagation delays
remain fixed as shown in Figure 7, and the transmission rate of the
UDP source is set to 1.5 Mbps.

Figure 12(a) shows the resulting throughput of each flow when
FIFO scheduling is used in all routers. Congestion collapse
occurs to such an extent that both TCP flows achieve throughputs
of zero, regardless of the round trip time of the TCP1 flow.
Figure 12(b) depicts the throughput of each flow when FIFO
scheduling is replaced with WFQ at all routers. WFQ allows the flows
to achieve perfectly fair allocations of the bottleneck link
bandwidth, but it does not prevent congestion collapse, as indicated by
the fact that the combined throughput is less than 1.5 Mbps.
Figure 12(c) shows the throughput of each flow when NBP is
combined with FIFO scheduling. Although the combined throughput
is very close to 1.5 Mbps and congestion collapse is prevented,
NBP does not completely eliminate the unfair bandwidth
allocations created by TCP1's longer round trip time.

Fig. 13. The GFC-2 network

Fig. 14(a). Per-flow throughput using NBP with WFQ

                        Simulation results (throughput)
Flow    Ideal global    WFQ      NBP with   NBP with   NBP with
group   max-min         only     FIFO       WFQ        CSFQ
        fair share
-----   ------------    -----    --------   --------   --------
A       10               8.32    10.96      10.00      10.40
B        5               5.04     1.84       5.04       4.48
C       35              27.12    31.28      34.23      31.52
D       35              16.64    33.84      34.95      32.88
E       35              16.64    37.76      34.87      33.36
F       10               8.32     7.60      10.08       8.08
G        5               4.96     1.04       4.96       5.28
H       52.5            36.15    46.87      50.47      47.76

Table 2. Per-flow throughput in the GFC-2 network (in Mbps)

In the third and final fairness experiment, we study whether
NBP can be made more fair by combining it with a fair queueing
mechanism such as weighted fair queueing (WFQ) or core-stateless
fair queueing (CSFQ). We consider the network model shown in
Figure 13. This model is adapted from the second General Fairness
Configuration (GFC-2), which is specifically designed to test the
max-min fairness of traffic control algorithms [15]. It consists of
22 unresponsive UDP flows, each generated by a source transmitting
at a constant bit rate of 100 Mbps. Flows belong to flow groups
which are labeled from A to H, and the network is designed in
such a way that members of each flow group receive the same
max-min bandwidth allocations. Links connecting core routers
serve as bottlenecks for at least one of the 22 flows, and all links
have propagation delays of 5 msec and bandwidths of 150 Mbps
unless otherwise shown in the figure.

The first column of Table 2 lists the global max-min fair share
allocations for all flows shown in Figure 13. These values
represent the ideal bandwidth allocations for any traffic control
mechanism that attempts to provide global max-min fairness. The
remaining columns list the equilibrium-state throughputs actually
observed after 4.5 seconds of simulation for several scenarios.
(Only the results for a single member of each flow group are
shown.) In the first scenario, NBP is not used and all routers
perform WFQ. As indicated by comparing the values in the first and
second columns, WFQ by itself is not able to achieve global
max-min fairness for all flows. This is due to the fact that WFQ does
not prevent congestion collapse. In the second scenario, NBP is
introduced at edge routers and FIFO scheduling is assumed at all
routers. Results listed in the third column show that NBP with
FIFO also fails to achieve global max-min fairness in the GFC-2
network, largely because NBP has no mechanism to explicitly
enforce fairness. In the third and fourth simulation scenarios, NBP
is combined with WFQ and CSFQ, respectively, and in both cases
NBP is able to achieve bandwidth allocations that are
approximately max-min fair for all flows.

(b) Using NBP with CSFQ
Fig. 14. Per-flow throughput in the GFC-2 network

NBP with WFQ achieves slightly better fairness than NBP with
CSFQ. We suspect two reasons for this fact. First, CSFQ is an
approximation of WFQ, and its performance depends on the
accuracy of its estimation of a flow's input rate and fair share. Second,
CSFQ's fairness mechanism engages only when congestion is
detected (i.e., when a router's buffer occupancy becomes sufficiently
large). Since NBP keeps buffer occupancies low by continuously
monitoring and responding to variations in the edge-to-edge round
trip time, CSFQ is not given many opportunities to engage.
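
One way to summarize how far each scenario in Table 2 departs from the ideal allocation is to average the relative deviation over the eight flow groups. The paper does not define such a metric, so the choice below is only an illustrative assumption. The throughput values are copied from Table 2, and `mean_relative_deviation` is a hypothetical helper.

```python
# Ideal global max-min fair shares and measured throughputs (Mbps),
# copied from Table 2. One entry per flow group A..H.
IDEAL = {"A": 10, "B": 5, "C": 35, "D": 35, "E": 35, "F": 10, "G": 5, "H": 52.5}

MEASURED = {
    "WFQ only": {"A": 8.32, "B": 5.04, "C": 27.12, "D": 16.64,
                 "E": 16.64, "F": 8.32, "G": 4.96, "H": 36.15},
    "NBP with FIFO": {"A": 10.96, "B": 1.84, "C": 31.28, "D": 33.84,
                      "E": 37.76, "F": 7.60, "G": 1.04, "H": 46.87},
    "NBP with WFQ": {"A": 10.00, "B": 5.04, "C": 34.23, "D": 34.95,
                     "E": 34.87, "F": 10.08, "G": 4.96, "H": 50.47},
    "NBP with CSFQ": {"A": 10.40, "B": 4.48, "C": 31.52, "D": 32.88,
                      "E": 33.36, "F": 8.08, "G": 5.28, "H": 47.76},
}


def mean_relative_deviation(measured, ideal):
    """Average of |measured - ideal| / ideal over all flow groups.

    This metric is an assumption made for illustration only.
    """
    return sum(abs(measured[g] - ideal[g]) / ideal[g] for g in ideal) / len(ideal)


DEVIATION = {name: mean_relative_deviation(values, IDEAL)
             for name, values in MEASURED.items()}

for name, dev in sorted(DEVIATION.items(), key=lambda item: item[1]):
    print(f"{name:14s} {dev:.3f}")
```

Under this metric, NBP with WFQ has the smallest deviation and NBP with CSFQ the next smallest, which agrees with the qualitative comparison in the text.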

Figures  14(a)  and  14(b)  show  how  rapidly  the  throughput  of 
each  flow  converges  to  its  max-min  fair  bandwidth  allocation  for 
the  NBP  with  WFQ  and  the  NBP  with  CSFQ  cases,  respectively. 
Even  in  a  complex  network  like  the  one  simulated  here,  all  flows 
converge  to  an  approximately  max-min  fair  bandwidth  allocation 
within  one  second. 

IV.  Implementation  Issues 

As we saw in the previous section, Network Border Patrol
is a congestion avoidance mechanism that effectively prevents
congestion collapse and provides approximate max-min fairness
when used with a fair queueing mechanism. However, a
number of important implementation issues must be addressed before
NBP can be feasibly deployed in the Internet. Among these issues
are the following:

1. Scalable flow classification. Perhaps the biggest impediment
to NBP's scalability is its reliance upon flow classification at edge
routers. In a network with a large number of flows, the overheads
of maintaining per-flow state, communicating per-flow feedback,
and performing per-flow rate control and rate monitoring may
become inordinately expensive. Fortunately, it is possible to address
this concern by classifying flows more coarsely at edge routers.
Instead of classifying a flow using the packet's addresses and port
numbers, the network's edge routers may aggregate many flows
together by, for instance, classifying flows using only the packet's
address fields. Alternatively, they might choose to classify flows
even more coarsely using only the packet's destination network
address. Coarse-grained flow aggregation has the effect of
significantly reducing the number of flows seen by NBP edge routers.
However, its drawback is that adaptive flows aggregated with
unresponsive flows may be indiscriminately punished by an ingress
router. Hence, NBP flow aggregation creates a trade-off between
scalability and per-flow fairness.

2. Scalable inter-domain deployment. Another approach to
improving the scalability of NBP, inspired by a suggestion in [7], is
to develop trust relationships between domains that deploy NBP.
The inter-domain router connecting two or more mutually
trusting domains may then become a simple NBP core router with
no need to perform per-flow tasks or keep per-flow state. If a
trust relationship cannot be established, border routers between
the two domains may exchange congestion information, so that
congestion collapse can be prevented not only within a domain,
but throughout multiple domains.

3. Scalable fairness. Although simulation results show that NBP
is able to achieve the best approximation to max-min fairness
when it is combined with WFQ, WFQ requires that core routers
perform per-flow operations, making it less scalable than CSFQ.
In networks where only a moderate number of simultaneous flows
is possible (e.g., a campus network), NBP with WFQ may be
preferable for its better fairness. However, NBP with CSFQ is
preferable in networks with a large number of flows since
approximate global max-min fairness is achieved in a more scalable
core-stateless fashion.

4. Incremental deployment. It is crucial that NBP be implemented
in all edge routers of an NBP-capable network. If one ingress
router fails to police arriving traffic or one egress router fails to
monitor departing traffic, NBP will not operate correctly and
congestion collapse will be possible. Nevertheless, it is not necessary
for all networks in the Internet to deploy NBP in order for it to
be effective. Any network that deploys NBP will enjoy the
benefits of eliminated congestion collapse within the network. Hence,
it is possible to incrementally deploy NBP into the Internet on a
network-by-network basis.

5. Multicast. Multicast routing makes it possible for copies of a
flow's packets to leave the network through more than one egress
router. When this occurs, an NBP ingress router must examine
backward feedback packets returning from each of the multicast
flow's egress routers. To determine whether the multicast flow is
experiencing congestion, the ingress router should execute its rate
control algorithm using backward feedback packets from the most
congested ingress-to-egress path (i.e., the one with the lowest flow
egress rate). This has the effect of limiting the ingress rate of a
multicast flow according to the most congested link in the flow's
multicast tree.

6. Multi-path routing. Multi-path routing makes it possible for
packets from a single flow to leave the network through
different egress routers. In order to support this possibility, an NBP
ingress router may need to examine backward feedback packets
from more than one egress router in order to determine the
combined egress rate for a single flow. For a flow passing through
more than one egress router, its combined egress rate is equal to
the sum of the flow's egress rates reported in backward feedback
packets from each egress router.

7. Integrated or differentiated services. NBP treats all flows
identically, but integrated and differentiated services networks allow
flows to receive different qualities of service. In such networks,
NBP should be used to regulate best effort flows only. Flows
using network services other than best effort are likely to be policed
by separate traffic control mechanisms.
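
Two of the issues above lend themselves to a small sketch: the classification granularities of item 1, and the way egress rates from several egress routers are combined in items 5 and 6. The code is illustrative only. The `Packet` tuple, the granularity names, the 24-bit prefix length, and all function names are assumptions introduced here, not part of NBP as specified in this paper.

```python
import ipaddress
from collections import namedtuple

# Hypothetical view of the header fields an edge router might inspect.
Packet = namedtuple("Packet", "src dst sport dport")


def flow_key(pkt, granularity, prefix_len=24):
    """Return the key under which an edge router would classify a packet.

    granularity (names assumed for this sketch):
      "ports"       - addresses and port numbers (finest)
      "addresses"   - source and destination addresses only
      "dst_network" - destination network address only (coarsest)
    prefix_len is an assumed network prefix length.
    """
    if granularity == "ports":
        return (pkt.src, pkt.dst, pkt.sport, pkt.dport)
    if granularity == "addresses":
        return (pkt.src, pkt.dst)
    if granularity == "dst_network":
        net = ipaddress.ip_network(f"{pkt.dst}/{prefix_len}", strict=False)
        return str(net)
    raise ValueError(f"unknown granularity: {granularity}")


def multicast_egress_rate(reports):
    """Item 5: the most congested path (lowest egress rate) governs."""
    return min(reports.values())


def multipath_egress_rate(reports):
    """Item 6: the combined egress rate is the sum over egress routers."""
    return sum(reports.values())


p1 = Packet("10.0.0.1", "10.0.1.5", 1000, 80)
p2 = Packet("10.0.0.1", "10.0.1.5", 1001, 80)
p3 = Packet("10.0.0.2", "10.0.1.9", 1000, 80)
for g in ("ports", "addresses", "dst_network"):
    print(g, len({flow_key(p, g) for p in (p1, p2, p3)}), "flow(s)")

reports = {"E1": 300, "E2": 120}   # egress rates in kbps, per egress router
print(multicast_egress_rate(reports), multipath_egress_rate(reports))
```

The three example packets count as three flows at the finest granularity, two when only addresses are used, and one when only the destination network is used, which illustrates the trade-off between state at the edge and per-flow fairness noted in item 1.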

V.  Conclusion 

In this paper, we have presented a novel congestion avoidance
mechanism for the Internet called Network Border Patrol. Unlike
existing Internet congestion control approaches, which rely solely
on end-to-end control, NBP is able to prevent congestion collapse
from undelivered packets. It does this by ensuring at the border
of the network that each flow's packets do not enter the network
faster than they are able to leave it. NBP requires no modifications
to core routers or end systems. Only edge routers are enhanced
so that they can perform the requisite per-flow monitoring,
per-flow rate control and feedback exchange operations.

Extensive simulation results provided in this paper show that
NBP successfully prevents congestion collapse from undelivered
packets. They also show that, while NBP is unable to eliminate
unfairness on its own, it is able to achieve approximate global
max-min fairness for competing network flows when combined
with a fair queueing mechanism such as WFQ. Furthermore, NBP,
when combined with CSFQ, approximates global max-min
fairness in a completely core-stateless fashion.

As in any feedback-based traffic control mechanism, stability is
an important performance concern in NBP. Using techniques
described in [16], we plan as part of our future work to perform an
analytical study of NBP's stability and convergence toward
max-min fairness. Preliminary results already suggest that NBP
benefits greatly from its use of explicit rate feedback, which prevents
rate over-corrections in response to indications of network
congestion.